A Moment-Matching Approach to Testable Learning and a New Characterization of Rademacher Complexity
A remarkable recent paper by Rubinfeld and Vasilyan (2022) initiated the
study of \emph{testable learning}, where the goal is to replace hard-to-verify
distributional assumptions (such as Gaussianity) with efficiently testable ones
and to require that the learner succeed whenever the unknown distribution
passes the corresponding test. In this model, they gave an efficient algorithm
for learning halfspaces under testable assumptions that are provably satisfied
by Gaussians.
In this paper we give a powerful new approach for developing algorithms for
testable learning using tools from moment matching and metric distances in
probability. We obtain efficient testable learners for any concept class that
admits low-degree \emph{sandwiching polynomials}, capturing most important
examples for which we have ordinary agnostic learners. We recover the results
of Rubinfeld and Vasilyan as a corollary of our techniques while achieving
improved, near-optimal sample complexity bounds for a broad range of concept
classes and distributions.
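For context, a standard formulation of the sandwiching-polynomial condition used above (notation ours, not the paper's): a concept $f$ admits degree-$k$ $\epsilon$-sandwiching polynomials under a distribution $D$ if

```latex
% Degree-k, eps-sandwiching polynomials for a concept f
% (standard definition; notation illustrative).
\[
  p_{\mathrm{down}}(x) \;\le\; f(x) \;\le\; p_{\mathrm{up}}(x)
  \quad \text{for all } x,
\]
\[
  \deg(p_{\mathrm{down}}),\ \deg(p_{\mathrm{up}}) \le k,
  \qquad
  \mathbb{E}_{x \sim D}\bigl[\,p_{\mathrm{up}}(x) - p_{\mathrm{down}}(x)\,\bigr] \le \epsilon.
\]
```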
Surprisingly, we show that the information-theoretic sample complexity of
testable learning is tightly characterized by the Rademacher complexity of the
concept class, one of the most well-studied measures in statistical learning
theory. In particular, uniform convergence is necessary and sufficient for
testable learning. This leads to a fundamental separation from (ordinary)
distribution-specific agnostic learning, where uniform convergence is
sufficient but not necessary.
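For reference, the (empirical) Rademacher complexity of a concept class $\mathcal{C}$ on a sample $S = (x_1, \dots, x_m)$ is the standard quantity (notation ours), with $\sigma_1, \dots, \sigma_m$ i.i.d. uniform in $\{-1,+1\}$:

```latex
\[
  \widehat{\mathfrak{R}}_S(\mathcal{C})
  \;=\;
  \mathbb{E}_{\sigma}\!\left[
    \sup_{f \in \mathcal{C}} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(x_i)
  \right].
\]
```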
An Efficient Tester-Learner for Halfspaces
We give the first efficient algorithm for learning halfspaces in the testable
learning model recently defined by Rubinfeld and Vasilyan (2023). In this
model, a learner certifies that the accuracy of its output hypothesis is near
optimal whenever the training set passes an associated test, and training sets
drawn from some target distribution -- e.g., the Gaussian -- must pass the
test. This model is more challenging than distribution-specific agnostic or
Massart noise models where the learner is allowed to fail arbitrarily if the
distributional assumption does not hold.
We consider the setting where the target distribution is Gaussian (or more
generally any strongly log-concave distribution) in $d$ dimensions and the
noise model is either Massart or adversarial (agnostic). For Massart noise, our
tester-learner runs in polynomial time and outputs a hypothesis with
(information-theoretically optimal) error $\mathrm{opt} + \epsilon$ for any
strongly log-concave target distribution. For adversarial noise, our
tester-learner obtains error $O(\mathrm{opt}) + \epsilon$ in polynomial time
when the target distribution is Gaussian; for strongly log-concave
distributions, we obtain error $\tilde{O}(\mathrm{opt}) + \epsilon$ in
quasipolynomial time.
Prior work on testable learning ignores the labels in the training set and
checks that the empirical moments of the covariates are close to the moments of
the base distribution. Here we develop new tests of independent interest that
make critical use of the labels and combine them with the moment-matching
approach of Gollakota et al. (2023). This enables us to simulate a variant of
the algorithm of Diakonikolas et al. (2020) for learning noisy halfspaces using
nonconvex SGD but in the testable learning setting.
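To make the label-oblivious baseline concrete, here is a minimal sketch of a moment-matching test against the standard Gaussian, in the spirit of the prior work described above; the degree cutoff, tolerance, and all names are illustrative assumptions, not the paper's actual test (which additionally makes critical use of the labels).

```python
import itertools
import numpy as np

def gaussian_moment(counts):
    """E[prod_j x_j^counts_j] for x ~ N(0, I_d): coordinates are independent,
    and E[g^k] = (k-1)!! for even k, 0 for odd k."""
    m = 1.0
    for a in counts:
        if a % 2 == 1:
            return 0.0
        m *= float(np.prod(np.arange(a - 1, 0, -2)))  # (a-1)!!, 1 if a == 0
    return m

def moments_match(X, max_degree=4, tol=0.3):
    """Accept iff every empirical mixed moment of total degree <= max_degree
    is within tol of its N(0, I) value. Cutoff and tolerance are illustrative."""
    n, d = X.shape
    for deg in range(1, max_degree + 1):
        for idx in itertools.combinations_with_replacement(range(d), deg):
            counts = np.bincount(idx, minlength=d)       # exponent per coordinate
            emp = float(np.mean(np.prod(X ** counts, axis=1)))
            if abs(emp - gaussian_moment(counts)) > tol:
                return False                             # reject: moments differ
    return True

# Samples actually drawn from N(0, I) should pass with high probability.
rng = np.random.default_rng(0)
print(moments_match(rng.standard_normal((20000, 3))))
```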
Ambient Diffusion: Learning Clean Distributions from Corrupted Data
We present the first diffusion-based framework that can learn an unknown
distribution using only highly-corrupted samples. This problem arises in
scientific applications where access to uncorrupted samples is impossible or
expensive to acquire. Another benefit of our approach is the ability to train
generative models that are less likely to memorize individual training samples
since they never observe clean training data. Our main idea is to introduce
additional measurement distortion during the diffusion process and require the
model to predict the original corrupted image from the further corrupted image.
We prove that our method leads to models that learn the conditional expectation
of the full uncorrupted image given this additional measurement corruption.
This holds for any corruption process that satisfies some technical conditions
(and in particular includes inpainting and compressed sensing). We train models
on standard benchmarks (CelebA, CIFAR-10 and AFHQ) and show that we can learn
the distribution even when all the training samples have 90% of their pixels
missing. We also show that we can finetune foundation models on small corrupted
datasets (e.g. MRI scans with block corruptions) and learn the clean
distribution without memorizing the training set.
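A minimal sketch of the training idea described above, assuming a masking corruption: further corrupt an already-masked sample, add diffusion noise, and regress back to the original corrupted image. The function names, the plain MSE objective, and the fixed noise level are illustrative simplifications, not the paper's code.

```python
import torch

def ambient_loss(model, x_corrupted, mask, sigma, extra_p=0.1):
    """Sketch of the objective: the model never sees clean data; it must
    reconstruct the *original* corrupted image from a further-corrupted,
    noised version. mask is 1 where a pixel was observed."""
    # Additional measurement distortion: randomly drop a further
    # extra_p fraction of the surviving pixels.
    extra = (torch.rand_like(mask) > extra_p).float() * mask
    # Diffusion forward process on the doubly corrupted image.
    noisy = extra * x_corrupted + sigma * torch.randn_like(x_corrupted)
    # Predict the original corrupted image; the loss is evaluated only
    # on the originally observed pixels.
    pred = model(noisy, extra, sigma)
    return ((pred - x_corrupted) ** 2 * mask).mean()

# Toy usage with a stand-in "model" that just returns its input.
x = torch.randn(8, 3, 32, 32)
mask = (torch.rand_like(x) > 0.9).float()   # 90% of pixels missing
model = lambda img, m, s: img
print(ambient_loss(model, x * mask, mask, sigma=0.5).item())
```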
The Polynomial Method is Universal for Distribution-Free Correlational SQ Learning
We consider the problem of distribution-free learning for Boolean function
classes in the PAC and agnostic models. Generalizing a recent beautiful work of
Malach and Shalev-Shwartz (2020) who gave the first tight correlational SQ
(CSQ) lower bounds for learning DNF formulas, we show that lower bounds on the
threshold or approximate degree of any function class directly imply CSQ lower
bounds for PAC or agnostic learning respectively. These match corresponding
positive results using upper bounds on the threshold or approximate degree in
the SQ model for PAC or agnostic learning. Many of these results were implicit
in earlier works of Feldman and Sherstov.
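For reference, the two degree notions invoked above are standard (notation ours): the $\epsilon$-approximate degree asks for pointwise approximation, while the threshold degree asks only for sign agreement.

```latex
\[
  \widetilde{\deg}_{\epsilon}(f)
  = \min\bigl\{ \deg(p) : |p(x) - f(x)| \le \epsilon
      \ \text{for all } x \in \{-1,1\}^n \bigr\},
\]
\[
  \deg_{\pm}(f)
  = \min\bigl\{ \deg(p) : \operatorname{sign}(p(x)) = f(x)
      \ \text{for all } x \in \{-1,1\}^n \bigr\}.
\]
```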
Hardness of Noise-Free Learning for Two-Hidden-Layer Neural Networks
We give superpolynomial statistical query (SQ) lower bounds for learning
two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard
(noise-free) model. No general SQ lower bounds were known for learning ReLU
networks of any depth in this setting: previous SQ lower bounds held only for
adversarial noise models (agnostic learning) or restricted models such as
correlational SQ.
Prior work hinted at the impossibility of our result: Vempala and Wilmes
showed that general SQ lower bounds cannot apply to any real-valued family of
functions that satisfies a simple non-degeneracy condition.
To circumvent their result, we refine a lifting procedure due to Daniely and
Vardi that reduces Boolean PAC learning problems to Gaussian ones. We show how
to extend their technique to other learning models and, in many well-studied
cases, obtain a more efficient reduction. As such, we also prove new
cryptographic hardness results for PAC learning two-hidden-layer ReLU networks,
as well as new lower bounds for learning constant-depth ReLU networks from
label queries.
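For context, in the statistical query model the learner never sees examples directly; it only queries an oracle $\mathrm{STAT}(\tau)$ for a distribution $D$ over labeled examples (standard definition; notation ours). Given any bounded query $\phi$, the oracle may return any value $v$ with

```latex
\[
  \bigl| v - \mathbb{E}_{(x,y) \sim D}[\phi(x,y)] \bigr| \le \tau,
  \qquad
  \phi : \mathcal{X} \times \mathcal{Y} \to [-1,1].
\]
```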